Abstract

This study considers the availability of room opportunities collected from a 
Japanese hotel booking site. We empirically analyze the daily number of room 
opportunities for four areas. To determine the migration trends of travelers, we 
discuss a finite mixture of Poisson distributions and the EM-algorithm as its 
parameter estimation method. We further propose a method to infer the probability
of opportunities existing for each observation. We characterize demand-supply
situations by means of the relationship between the averaged room prices
and the probability of opportunity existing.

Keywords: Japanese Hotel Reservation, Mixture of Poisson distributions, 

1. Introduction

Recent technological development enables us to purchase various kinds of
items and services via E-commerce systems. The emergence of Internet applications
has had an unprecedented impact on the way we purchase goods and
services. From the data on items and services available at E-commerce platforms,
we may expect that the utilities of agents in socio-economic systems can be
estimated directly.

Such an impact on travel and tourism, and specifically on hotel room reservations,
has received considerable attention (Law, 2009). According to Pilia (2008), 40 per
cent of hotel reservations will be made via the Internet in 2008, up from 33 per cent
in 2007 and 29 per cent in 2006. Therefore, the coverage of room opportunities
via the Internet may be sufficient to provide statistically significant results and
to support a comprehensive analysis based on hotel booking data collected
from Internet booking sites.

From our personal experience, it is becoming more popular
to make hotel reservations via the Internet. When we use a hotel booking
site, we notice that we sometimes find preferable room opportunities and sometimes do not.
Namely, hotel accessibility seems to be random. We further know that both
the date and place of stay are important factors in determining the availability
of room opportunities. Hence, room availability depends on the calendar
(weekdays, weekends, and holidays) and on the region.

This availability of the hotel rooms may indicate the future migration trends 
of travelers. Therefore, it is worth considering accumulation of comprehensive 
data of hotel availability in order to detect inter-migration in countries. 

The migration processes have been intensively studied in the context of
socio-economic dynamics, with particular interest in quantitative research. Weidlich
and Haag proposed the master equation with transition probabilities
depending on both region-dependent and time-dependent utility and mobility
in order to describe the collective tendency of agent decisions in migration
chance (Haag & Weidlich, 1984).



Since the motivation of migration seems to come from both psychological 
and physical factors, the understanding of the dynamics of the migration is 
expected to provide useful insights on inner states of agents and their collective 
behavior. 

In the present article, we discuss a model to capture behavior of consumers 
at a hotel booking site and investigate statistics of the number of available room 
opportunities from several perspectives. 

This article is organized as follows. In Sec. 2 we give a brief description
of the data collected from a Japanese hotel booking site. In Sec. 3
we give an overview of the collected data on room opportunities. In Sec. 4
we consider a model to capture room opportunities and derive a finite mixture
of Poisson distributions from binomial processes. In Sec. 5 we introduce the
EM-algorithm to estimate the parameters of the mixture of Poisson distributions.
In Sec. 6 we compute parameter estimates for an artificial data set generated
from the mixture of Poisson distributions. In Sec. 7 we show the results of the
empirical analysis of the room opportunities and discuss the relationship between
the existence probabilities of opportunities and their rates. Sec. 8 is devoted to
conclusions.



2. Data description 

In this section, we give a brief explanation of the method used to collect data on
hotel availability. In this study, we used a Web API (Application Programming
Interface) to collect the data from a Japanese hotel booking site named
Jalan (the data are provided by the Jalan Web service). Jalan is one of the most
popular hotel reservation services in Japan that provide a Web API. An API is a
set of interfaces designed to simplify the development of application programs.

The Jalan Web service provides interfaces for both hotel managers and customers
(see Fig. 1). The mechanism of Jalan is as follows. The hotel managers can
enter information on room opportunities served by their hotels via an Internet
interface. The consumers can book rooms from available opportunities via the
Jalan Web site. Third parties can even build their own web services with the
Jalan data by using the Web API.



[Diagram: Customer, Jalan Server, Hotel]




Figure 1: A conceptual illustration of Jalan web service. The hotel managers enter information 
on rooms (plans) which will be served at the hotels. Customers can search and book rooms 
from all the available rooms (plans) via Jalan web page. 



We are collecting all the available opportunities that appear in Jalan
for rooms in which two adults can stay one night.
The data are sampled from the Jalan web site (http://www.jalan.net) daily.
The data on room opportunities collected through the Jalan Web API are stored as
CSV files.

In the data set, there exist over 100,000 room opportunities from over 14,000
hotels. Tab. 1 shows the contents included in the data set. Each plan contains
the sampled date, stay date, regional sequential number, hotel identification
number, hotel name, postal address, URL of the hotel web page, geographical
position, plan name, and rate.

Since the data contain regional information, it is possible for us to analyze 
regional dependence of hotel rates. Throughout the investigation, we regard the 
number of recorded opportunities (plan) as a proxy variable of the number of 
available room stocks. 
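
As an illustration of how the daily number of recorded opportunities could be tallied from such records, the following minimal Python sketch counts plans per region and stay date. The column names (`region`, `stay_date`, `hotel_id`, `rate`) and the sample rows are hypothetical and chosen only for this example; they are not the actual layout of the collected CSV files.

```python
import csv
import io
from collections import Counter

def count_opportunities(rows):
    """Count recorded plans per (region, stay date) pair."""
    counts = Counter()
    for row in rows:
        counts[(row["region"], row["stay_date"])] += 1
    return counts

# Hypothetical in-memory sample standing in for one collected CSV file.
SAMPLE = """region,stay_date,hotel_id,rate
010502,2010-04-15,301,9800
010502,2010-04-15,302,21000
010502,2010-04-16,301,9800
171408,2010-04-15,877,15000
"""

counts = count_opportunities(csv.DictReader(io.StringIO(SAMPLE)))
for key in sorted(counts):
    print(key, counts[key])
```

Keying the counter on the pair (region, stay date) keeps the regional and calendar dependence separate, which is the form in which the counts are used in the later sections.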

For this analysis, we used the data for the period from 24th December 2009 to 4th
November 2010. Fig. 2 shows an example of the distribution of opportunities and
representative rates. Yellow (black) filled squares represent hotel plans costing
50,000 JPY (1,000 JPY) per night. Red filled squares represent hotel plans costing over
50,000 JPY per night. We found that the opportunities depend strongly
on location. Specifically, many hotels are located around several
major cities such as Tokyo, Osaka, Nagoya, and Fukuoka.

Fig. 2 (bottom) shows the probability density distribution of rates on 15th April 2010
across Japan. The probability density has two peaks, around 10,000 JPY and
20,000 JPY.

Table 1: The data format of room opportunities.

  Date of collection
  Date of stay
  Hotel identification number
  Hotel name
  Hotel name (Kana)
  Postal code
  Address
  URL
  Latitude
  Longitude
  Opportunity name
  Meal availability
  The latest best rate
  Rate per night


3. Overview of the data 

The number of room opportunities in which two adults can stay is counted
from the recorded CSV files throughout the whole sampled period. Fig. 3 shows
the daily number of room opportunities. From this graph, we found three facts:

(1) There exists weekly fluctuation in the number of available room opportunities.

(2) The number of available opportunities depends strongly on
the Japanese calendar. Namely, Saturdays and holidays drive the reservation
activities of consumers. For example, during the New Year holidays
(around 12/30-1/1) and holidays in the spring season (around 3/20), the
time series of the numbers shows big drops.

(3) The number eventually increases as the date of stay approaches. Specifically,
it is observed that the number of opportunities drastically decreases two
days before the date of stay.

Fig. 4 shows the number of available room opportunities in four regions
for the period from 24th December 2009 to 4th November 2010. We calculated the
numbers at 010502 (Otaru), 072005 (Aizu-Kohgen, Yunogami, and Minami-Aizu),
136812 (Shiragane), and 171408 (Yuzawa). It is found that their temporal
development depends on the region.

Figure 2: An example of the distribution of rates under the condition that two adults can stay at
the hotel for one night on 15th April 2010 (top). The probability density distribution of rates
on 15th April 2010 (bottom). These data were sampled on 9th April 2010. Yellow (black)
filled squares represent hotel plans costing 50,000 JPY (1,000 JPY) per night. Red filled squares
represent hotel plans costing over 50,000 JPY per night.

Figure 3: The number of hotels in which two adults can stay one night for the period from 24th
December 2009 to 4th November 2010.

Furthermore, Fig. 5 shows the dependence of averaged rates across Japan
on calendar dates. On the New Year holidays in 2010, it is confirmed
that the averaged rates rapidly decrease, whereas on the spring holidays in
2010 the averaged rates rapidly increase. This difference seems to arise from
differences in consumers' motivation structure and preference for price levels
between these holiday seasons.

Fig. 6 shows the dependence of averaged rates in four regions on calendar
dates. The tendency of averaged rates differs from region to region, especially
during the New Year holidays and the summer vacation season.
This means that demand-supply situations depend on regions. We need to know
the tendency of the demand-supply situation of each area in a rigorous manner.



4. Model 

Let $N_m$ and $M$ be the total number of potential rooms in area $m$ and
the total number of potential consumers, respectively. The total number of opportunities
$N_m$ may be assumed to be constant since the Internet booking style has been
sufficiently accepted, and almost all hotels offer their rooms via the Internet.
Ignoring the birth-death process of consumers, we also assume that $M$ is
constant.

Figure 4: The number of demand for four regions per day. It is found that their
fluctuations depend on the region.

We further assume that a Bernoulli random variable represents the booking
decision of a consumer among $N_m$ kinds of room opportunities. In order to express
the status of rooms within the observation period (one day), we introduce $M$
Bernoulli random variables with time-dependent success probability $p_m(t)$,

$$ y_{mi}(t) = \begin{cases} 1 & \text{w.p. } p_m(t) \quad \text{(the $i$-th consumer holds a reservation)} \\ 0 & \text{w.p. } 1 - p_m(t) \quad \text{(the $i$-th consumer does not hold a reservation)} \end{cases} \qquad (1) $$

where $y_{mi}(t)$ $(i = 1, \ldots, M)$ represents the status of the $i$-th consumer for a
room at time $t$.

If we assume that $p_m(t)$ is sufficiently small, so that $N_m > \sum_{i=1}^{M} y_{mi}(t)$,
then the number of room opportunities at time $t$ may be proportional to the
difference between the total number of potential rooms and the number of
booked rooms at time $t$,

$$ Y_m(t) \propto N_m - \sum_{i=1}^{M} y_{mi}(t). \qquad (2) $$

Namely, we have

$$ Z_m(t) = k N_m - Y_m(t) = k \sum_{i=1}^{M} y_{mi}(t), \qquad (3) $$

where $k$ is a positive constant.

Figure 5: Time series of average rates of room opportunities on stay dates for four regions. The
mean value of rates is calculated from all the available room opportunities which are observed
on each stay date.


Assuming further that $y_{m1}(t), \ldots, y_{mM}(t)$ are independently and identically
distributed, we obtain that the number of available opportunities $\sum_{i=1}^{M} y_{mi}(t)$
follows a binomial distribution $\mathrm{B}(M, r_m(t))$. Furthermore, assuming $r_m \ll 1$,
$M \gg 1$, and $M r_m \gg 1$, we can approximate the demand $Z_m(t) = k N_m - Y_m(t)$ as
a Poisson random variable, which follows

$$ \Pr_Z(l = Z_m(t) \mid r_m(t)) = \frac{\{M r_m(t)\}^{l}}{l!} \exp\{-M r_m(t)\}, \qquad (4) $$

where we define $k p_m(t)$ as $r_m(t)$.

Since the agents interact with one another, the psychological
atmosphere (mood) that they collectively create influences their
decisions. Such a psychological effect may be expressed as fluctuations
of the success probability $r_m(t)$ at time $t$ in the Bernoulli random variable.

Let us assume that the time-dependent probability $r_m(t)$ $(0 < r_m(t) < 1)$
is sampled from a probability density $F_m(r)$. From Eq. (4), the marginal
distribution of the Poisson distribution conditioned on $r_m$ with probability
fluctuation $F_m(r_m)$ is given by

$$ \Pr_{Z_m}(l = z_m) = \int_0^1 F_m(r_m) \frac{(M r_m)^{l}}{l!} e^{-M r_m} \, dr_m. \qquad (5) $$

Since we can observe the number of available opportunities $Z_m(t)$, we may
estimate the parameters of the distribution $F_m(r_m)$ from successive observations.
Figure 6: Time series of average rates of room opportunities on stay dates for four regions. The
mean value of rates is calculated from all the available room opportunities which are observed
on each stay date.

For the sake of simplicity, we further assume that $r_m(t)$ is sampled from
discrete categories $r_{mi}$ with probability $a_{mi}$
$(0 < r_{mi} < 1;\ i = 1, \ldots, K_m;\ \sum_{i=1}^{K_m} a_{mi} = 1)$.
These parameters are expected to describe the motivation structure of
consumers depending on calendar days (weekdays/weekends and special holidays,
business purpose/recreation and so forth). Then, since $F_m(r_m)$ is given by

$$ F_m(r_m) = \sum_{i=1}^{K_m} a_{mi} \, \delta(r_m - r_{mi}), \qquad (6) $$

$\Pr_{Z_m}(l = z_m)$ is calculated as

$$ \Pr_{Z_m}(l = z_m) = \sum_{i=1}^{K_m} a_{mi} \frac{(M r_{mi})^{l}}{l!} e^{-M r_{mi}}. \qquad (7) $$

Hence, Eq. (7) is a finite mixture of Poisson distributions.
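
The finite mixture of Poisson distributions can be evaluated numerically as in the following minimal Python sketch. The function names and the toy parameter values (`M`, `a`, `r`) are assumptions made for illustration and are not taken from the data; the computation is done in log space because $M r_{mi}$ can be large.

```python
import math

def log_poisson_pmf(z, mean):
    """Log of the Poisson probability mass at count z for a given mean."""
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def mixture_pmf(z, M, a, r):
    """Probability of count z under a mixture with ratios a and rates r."""
    logs = [math.log(ai) + log_poisson_pmf(z, M * ri) for ai, ri in zip(a, r)]
    top = max(logs)  # factor out the largest term to avoid underflow
    return math.exp(top) * sum(math.exp(v - top) for v in logs)

# Toy setting (illustrative values only): component means M*r are 2 and 10.
M = 1000
a = [0.3, 0.7]
r = [0.002, 0.01]

total = sum(mixture_pmf(z, M, a, r) for z in range(201))
mean = sum(z * mixture_pmf(z, M, a, r) for z in range(201))
print(total, mean)
```

The two printed values serve as a sanity check: the probabilities should sum to one, and the mean should equal $M \sum_i a_i r_i$.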

5. Estimation procedure by means of the EM algorithm 

The construction of estimators for finite mixtures of distributions has been
considered in the literature on estimation. Estimation procedures for the Poissonian
mixture model have been studied by several researchers. Specifically,
moment estimators and maximum likelihood estimators have been intensively studied.

Moment estimators were tried on a mixture of two normal distributions
by Karl Pearson as early as 1894. Graphical solutions have been given
by Cassie (1954), Harding (1949) and Bhattacharya (1967). Rider discusses
mixtures of binomial and mixtures of Poisson distributions in the case of two
distributions (Rider, 1961).

Hasselblad proposed the maximum likelihood estimator and derived recursive
equations for the parameters (Hasselblad, 1969). The effectiveness of the maximum
likelihood estimator for mixtures of Poissonian distributions is widely
recognized. Dempster discusses the EM-algorithm for mixtures of distributions
in several cases (Dempster, 1977). By using the EM-algorithm, we can
obtain parameter estimates from mixture data.

Let $Z_m(1), \ldots, Z_m(T)$ be the number of demand (the number of potential
room opportunities minus the number of available room opportunities) computed
at each observation day. From the observation sequence, let us consider
a method to estimate the parameters of Eq. (7) based on the maximum likelihood
method. In this case, since the log-likelihood function can be described as

$$ L_m(a_{m1}, \ldots, a_{mK_m}, r_{m1}, \ldots, r_{mK_m}) = \sum_{s=1}^{T} \log \left( \sum_{i=1}^{K_m} a_{mi} \frac{(M r_{mi})^{Z_m(s)}}{Z_m(s)!} e^{-M r_{mi}} \right), \qquad (8) $$

parameter estimates are obtained by the maximization of the log-likelihood
function $L_m(a_{m1}, \ldots, a_{mK_m}, r_{m1}, \ldots, r_{mK_m})$,

$$ \{\hat{a}_{m1}, \ldots, \hat{a}_{mK_m}, \hat{r}_{m1}, \ldots, \hat{r}_{mK_m}\} = \arg\max_{\{a_{mi}\}, \{r_{mi}\}} L_m(a_{m1}, \ldots, a_{mK_m}, r_{m1}, \ldots, r_{mK_m}), \qquad (9) $$

under the constraint $\sum_{i=1}^{K_m} a_{mi} = 1$.

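The maximum likelihood estimator for the mixture of Poisson models given
by Eq. (7) can be derived by setting the partial derivatives of Eq. (8) with
respect to each parameter to zero (see Appendix A). They lead to the following
recursive equations for the parameters:

$$ a_{mi}^{(\nu+1)} = \frac{1}{T} \sum_{t=1}^{T} \frac{a_{mi}^{(\nu)} F_{mi}^{(\nu)}(Z_m(t))}{G_m^{(\nu)}(Z_m(t))} \quad (i = 1, \ldots, K_m), \qquad (10) $$

$$ r_{mi}^{(\nu+1)} = \frac{1}{M} \frac{\sum_{t=1}^{T} Z_m(t) F_{mi}^{(\nu)}(Z_m(t)) / G_m^{(\nu)}(Z_m(t))}{\sum_{t=1}^{T} F_{mi}^{(\nu)}(Z_m(t)) / G_m^{(\nu)}(Z_m(t))} \quad (i = 1, \ldots, K_m), \qquad (11) $$

where

$$ F_{mi}^{(\nu)}(z) = \frac{(M r_{mi}^{(\nu)})^{z}}{z!} e^{-M r_{mi}^{(\nu)}}, \qquad (12) $$

$$ G_m^{(\nu)}(z) = \sum_{i=1}^{K_m} a_{mi}^{(\nu)} F_{mi}^{(\nu)}(z). \qquad (13) $$
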
These recursive equations give us a way to estimate the parameters by starting
from an adequate set of initial values. These recursive equations are also referred
to as the EM-algorithm for the mixture of Poisson distributions (Dempster,
1977; Liu, 2006).
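
The recursive updates can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function names, the toy observations, the initial values, and the fixed number of iterations are all hypothetical, and no safeguard is included against a component whose mixing ratio collapses to zero.

```python
import math

def log_poisson_pmf(z, mean):
    """Log of the Poisson probability mass at count z for a given mean."""
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def em_step(z_obs, M, a, r):
    """One pass of the recursive updates for mixing ratios a and rates r."""
    K, T = len(a), len(z_obs)
    resp_sum = [0.0] * K    # sum over t of a_i F_i(z_t) / G(z_t)
    resp_z_sum = [0.0] * K  # the same sum weighted by z_t
    for z in z_obs:
        logs = [math.log(a[i]) + log_poisson_pmf(z, M * r[i]) for i in range(K)]
        top = max(logs)  # subtract the maximum to avoid underflow
        w = [math.exp(v - top) for v in logs]
        norm = sum(w)
        for i in range(K):
            resp = w[i] / norm
            resp_sum[i] += resp
            resp_z_sum[i] += resp * z
    new_a = [s / T for s in resp_sum]
    # The factor a_i cancels in the ratio, so weighting by resp is equivalent.
    new_r = [sz / (M * s) for sz, s in zip(resp_z_sum, resp_sum)]
    return new_a, new_r

def fit(z_obs, M, a, r, n_iter=50):
    """Iterate the updates a fixed number of times from the given start."""
    for _ in range(n_iter):
        a, r = em_step(z_obs, M, a, r)
    return a, r

# Toy data: six counts near 20 and six near 100 (illustrative values only).
z_obs = [18, 22, 19, 21, 20, 20, 98, 102, 100, 101, 99, 100]
a_hat, r_hat = fit(z_obs, M=10_000, a=[0.5, 0.5], r=[0.001, 0.02])
print(a_hat, r_hat)
```

In practice one would stop when the change in the log-likelihood falls below a tolerance rather than after a fixed number of passes, and would try several initial values because the iteration can converge to a local maximum.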



In order to determine the adequate number of parameters, we introduce the
Akaike Information Criterion (AIC), which is defined as

$$ \mathrm{AIC}(K_m) = 4 K_m - 2 \hat{L}_m, \qquad (14) $$

where $\hat{L}_m$ is the maximum value of the log-likelihood function in terms of the $2K_m$
parameter estimates. $\hat{L}_m$ is computed from the log-likelihood value per
observation with the parameter estimates obtained from the EM-algorithm,

$$ \hat{L}_m = \sum_{s=1}^{T} \log \left( \sum_{i=1}^{K_m} \hat{a}_{mi} \frac{(M \hat{r}_{mi})^{Z_m(s)}}{Z_m(s)!} e^{-M \hat{r}_{mi}} \right). \qquad (15) $$

Since it is known that the preferred model should be the one with the lowest
AIC value, we obtain the adequate number of categories $\hat{K}_m$ as

$$ \hat{K}_m = \arg\min_{K_m} \mathrm{AIC}(K_m). \qquad (16) $$
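
The selection of the number of categories can be illustrated with the following Python sketch. It evaluates the AIC in the form $4K - 2\hat{L}$ for two candidate fits and picks the smaller value. The toy observations and the candidate parameter sets are hypothetical and are given directly rather than being estimated, so that the example stays short.

```python
import math

def log_poisson_pmf(z, mean):
    """Log of the Poisson probability mass at count z for a given mean."""
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def log_likelihood(z_obs, M, a, r):
    """Log-likelihood of the observations under a Poisson mixture."""
    total = 0.0
    for z in z_obs:
        logs = [math.log(ai) + log_poisson_pmf(z, M * ri) for ai, ri in zip(a, r)]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total

def aic(z_obs, M, a, r):
    """AIC with 2K parameters counted for K categories."""
    return 4 * len(a) - 2 * log_likelihood(z_obs, M, a, r)

# Toy data and hypothetical candidate fits for K = 1 and K = 2.
z_obs = [18, 22, 19, 21, 20, 20, 98, 102, 100, 101, 99, 100]
M = 10_000
candidates = {
    1: ([1.0], [0.006]),
    2: ([0.5, 0.5], [0.002, 0.01]),
}
scores = {K: aic(z_obs, M, a, r) for K, (a, r) in candidates.items()}
best_K = min(scores, key=scores.get)
print(scores, best_K)
```

For real data each candidate would first be fitted by the EM iteration, and the AIC would then be compared across the fitted models.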

Furthermore, we consider a method to determine the underlying Poisson
distribution from which the observation $Z_m(s)$ was sampled. Since the underlying
Poisson distribution is one of the Poisson distributions of the mixture, its
local likelihood function of $Z_m(s)$ may be the largest among all the local likelihood
functions of $Z_m(s)$. Based on this idea we propose the following method.

Let $R_{mi}(z)$ $(i = 1, \ldots, K_m)$ be the log-likelihood function of the $i$-th category
in area $m$ with parameter estimate $\hat{r}_{mi}$. From Eq. (7), it is defined as

$$ R_{mi}(z) = z \log M + z \log \hat{r}_{mi} - M \hat{r}_{mi} - \log(z!). \qquad (17) $$

By finding the maximum log-likelihood value $R_{mi}(Z_m(s))$ for $i = 1, \ldots, K_m$, we
can select the adequate distribution from which $Z_m(s)$ was extracted. Namely, the
adequate category $\hat{i}_s$ for $Z_m(s)$ should be given as

$$ \hat{i}_s = \arg\max_{i} R_{mi}(Z_m(s)). \qquad (18) $$
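
The category selection can be sketched in a few lines of Python. The function names and the toy values of `M` and `r_hat` are assumptions made for illustration; categories are returned with 1-based indices to match the notation $i = 1, \ldots, K_m$.

```python
import math

def local_log_likelihood(z, M, r_i):
    """Log-likelihood of count z under the Poisson component with rate r_i."""
    return z * math.log(M) + z * math.log(r_i) - M * r_i - math.lgamma(z + 1)

def select_category(z, M, r_hat):
    """Return the 1-based index of the component with the largest value."""
    scores = [local_log_likelihood(z, M, r_i) for r_i in r_hat]
    return scores.index(max(scores)) + 1

# Toy estimates (illustrative only): component means M*r are 20 and 100.
M = 10_000
r_hat = [0.002, 0.01]
print([select_category(z, M, r_hat) for z in (22, 98)])
```

Note that the mixing ratios do not enter this rule: only the component likelihoods are compared, as in the definition of the local log-likelihood.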



6. Numerical simulation 

Before going into the empirical analysis of actual data on room opportunities
with the proposed parameter estimation method, we apply the method to
artificial data.

We generate the time series $z(s)$ $(s = 1, \ldots, T)$ from a mixture of Poisson
distributions, given by

$$ r(t) = r_i \ \text{w.p. } a_i, \qquad \Pr(Z = z(t) \mid r(t)) = \frac{(M r(t))^{z(t)}}{z(t)!} e^{-M r(t)}, \qquad (19) $$

where $K$ is the number of categories and $a_i$ represents the probability for the $i$-th
category to appear $(i = 1, \ldots, K;\ \sum_{i=1}^{K} a_i = 1)$.

We set $K = 12$ and $M = 100,000,000$. Using the parameters shown in Tab. 2,
we generated the artificial data shown in Fig. 7. Next, we estimated the parameters
from $T (= 200)$ observations without any prior knowledge of the parameters.

As shown in Fig. 8 (top), the AIC value with respect to $K$ takes its minimum
at $K = 12$. In order to confirm the adequacy of the parameter estimates, we conduct a
Kolmogorov-Smirnov (KS) test between the artificial data and sequences of
random numbers generated with the parameter estimates.

Fig. 8 (bottom) shows the KS statistic at each $K$. Since at $K = 12$ the KS
statistic is computed as 0.327, which is less than 1.36, the null hypothesis that
these time series are sampled from the same distribution is not rejected at the 5%
significance level. Tab. 3 shows the parameter estimates.
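
A two-sample KS statistic of this kind can be computed as in the following Python sketch. It is assumed here that the reported value is the scaled statistic $\sqrt{nm/(n+m)}\,D$, since it is compared with the asymptotic 5% critical value 1.36; the text does not state this explicitly. The function names and sample values are hypothetical.

```python
import math

def ks_distance(x, y):
    """Largest absolute gap between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step past every copy of v in both samples so that ties are handled.
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def scaled_ks(x, y):
    """KS distance scaled by sqrt(n*m/(n+m)) for comparison with 1.36."""
    n, m = len(x), len(y)
    return math.sqrt(n * m / (n + m)) * ks_distance(x, y)

same = scaled_ks([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_distance([1, 2, 3, 4], [3, 4, 5, 6])
print(same, shifted, same < 1.36)
```

With this scaling, the null hypothesis of a common distribution is not rejected at the 5% level when the scaled value stays below 1.36.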

Furthermore, we selected the value of $\hat{r}_i$ for each observation by means of the
proposed method described in Sec. 5. The parameter estimates can thus be
computed as a function of time $t$.

However, we found differences between the parameter estimates and the true
values. In particular, true parameter values that are close to each other were
estimated as the same parameter. As a result, the number of categories is estimated as $K = 12$,
which differs from the number of the true set of parameters as $K = 12$.

After determining the underlying distribution for each observation, we further
computed the estimation errors between the parameter estimates and the true
parameters for each observation. As shown in Fig. 9, we confirmed that the
estimation error, defined as $|\hat{r}_i(t) - r_i(t)|$, is less than 8.0 x 10^^, and that the
relative error, defined as $|\hat{r}_i(t) - r_i(t)| / r_i(t)$, is less than 0.3 %. It is confirmed
that the parameter estimates obtained with the EM-algorithm agree with the true values
of the parameters for the artificial time series.

Hence, it is concluded that the discrimination errors between two close pa- 
rameters do not play a critical role for the purpose of the parameter identification 
at each observation. 



Table 2: Parameters of the Poissonian mixture model used to generate the artificial time series. The
number of categories is set as K = 12.

   i    r_i         a_i
   1    0.000025    0.109726
   2    0.000223    0.070612
   3    0.000280    0.073355
   4    0.000479    0.077612
   5    0.000613    0.094848
   6    0.000652    0.073841
   7    0.001219    0.090867
   8    0.001233    0.062191
   9    0.001295    0.077662
  10    0.001341    0.102573
  11    0.001412    0.085892
  12    0.001570    0.080821

Figure 7: Examples of time series generated from the Poissonian mixture model for K = 12
and M = 100,000,000.

7. Empirical results and discussion 

In this section, we apply the proposed method to estimate parameters for
actual data. We estimate the parameters $a_{mi}$ and $r_{mi}$ from the numbers
shown in Fig. 3 in four regions with the log-likelihood function given in Eq.
(8).

Fig. 4 shows the estimated time series of the demand from 24th December 2009 to
4th November 2010. In order to obtain this demand, we assume that $N_m =
\max_t \{Z_m(t) + 10\}$ and that $M$ is approximately equivalent to the total
population of Japan, so that $M = 1,000,000,000$.

According to the values of AIC shown in Fig. 10 (left), the adequate
number of parameters is estimated as $K = 12$ (010502), $K = 10$ (072005),
$K = 5$ (136812), and $K = 11$ (171408), respectively. Fig. 10 (right) shows
the KS value at each $K$. It is found that the KS test does not reject the mixture of
Poisson distributions with the parameter estimates. Tab.
4 shows the AIC and KS values at the adequate number of parameters for each
area.

Secondly, we examined the relationship between the mean of room rates
and the number of opportunities (left) and that between the mean of room rates
and the existence probabilities $r_{mi}$ (right) for each day. Fig. 11 shows their
scatter plots for the period from 25th December 2009 to 4th November 2010. Each
point represents the relation for one day. The variance of room rates is proportional
to the existence probability. It is confirmed that the mean of room rates for two adults per
night is about 20,000 JPY. This means that excess supply increases the
uncertainty of room rates.

Figure 8: The value of AIC for artificial data is shown as a function of the number of
parameters K (left). The lowest value of AIC is found as 3803.20 at K = 12. The KS statistic
between the artificial data and estimated ones at each K (right).

Figure 9: Each parameter estimate for an observation is shown as a function of time
(left). The relative error between the true parameter and the estimated one (right).

Thirdly, by means of the method to select the underlying distribution, from 
Poisson distributions for the mixture, we determined the category i for each 
day. As shown in Fig. [1^ (bottom), the probabilities show strong dependence 
on the Japanese calendar. 

It is confirmed that the probabilities depend on both region and time.
We found that the probabilities take higher values in each region
on holidays and weekends (Saturdays). It is observed that higher probabilities
were maintained in the winter season at 072005 (Aizu-Kohgen, Yunogami, Minami-Aizu).
This is because this area is one of the winter ski resorts.

Specifically, on holidays and Saturdays, they take smaller values than on
weekdays. Tabs. 5 and 6 show the parameter estimates and the exact dates included in
each category at 010502 (Otaru), respectively. From these tables we found that
travel to this region shows a seasonal tendency.

It is found that a lot of travelers visited this area and the hotel rooms were actively
booked on dates included in categories 1 to 3. On the other hand,
Table 3: Parameter estimates of the Poissonian mixture model by using the EM estimator
(bottom). The number of categories was estimated as K = 12 and its AIC value is obtained
as AIC = 3803.20.
Table 4: At the adequate number of parameters, AIC, maximum log-likelihood (ll), KS value,
and p-value for each region.

  regional number    K     AIC        ll         p-value    KS value
  010502             12    3558.31    1756.15    0.807      0.532
  072005             10    3009.10    1485.55    0.187      1.088
  136812              5    2572.33    1277.17    0.107      1.245
  171408             11    3695.25    1826.62    0.465      0.850

this area were actively booked on dates included in categories 10 to 12. 

The covariates among the numbers of room opportunities in different regions
are important factors in determining the demand-supply situation across
Japan.

From Tab. [Sj it is confirmed that in the case of Otaru the end of October to 
the beginning of November in 2010 was a season of high demand. This tendency
is different from the calendar dates. The relationship between the number of
opportunities and averaged rates is slightly different from that between the
existence probability and the averaged rates. By using our proposed method
we can compare the difference in consumers' demand between dates. From
the dependence of averaged prices on the probability $r_{mi}$, we can understand
the preference and motivation structure of consumers for travel and tourism.

8. Conclusion 

We analyzed the data of room opportunities collected from a Japanese ho- 
tel booking site. We found that there is strong dependence of the number of 
available opportunities on the Japanese calendar. 

Firstly, we proposed a model of hotel booking activities based on a mixture
of Poisson distributions with time-dependent intensity. From a binomial model
with a time-dependent success probability, we derived the mixture of Poisson
distributions. Based on the mixture model, we characterized the number of room
opportunities on each day with different parameters, regarding their difference
as the motivation structure of consumers depending on the Japanese calendar.

Table 5: Parameter estimates of the Poissonian mixture model by using the EM estimator.
The number of categories was estimated as K = 12 at 010502.

Figure 10: The value of AIC (left) and that of KS statistics (right) shown as a function of
the number of parameters K for four areas.

Secondly, we proposed a parameter estimation method on the basis of the
EM-algorithm and a method to select the underlying distribution for each
observation from the Poisson distributions of the mixture through the maximization
of the local log-likelihood value.

Thirdly, we computed parameters for artificial time series generated from the
mixture of Poisson distributions with the proposed method, and confirmed that
the parameter estimates agree with the true values of the parameters to a statistically
significant degree. We conducted an empirical analysis of the room opportunity data.
We confirmed that the relationship between the averaged prices and the
probabilities of opportunities existing is associated with demand-supply situations.
Furthermore, we extracted multiple time series of the numbers in four regions
Table 6: The dates included in each category for 010502 (Otaru).



1 


2010-10-26,2010-10-27,2010-10-28,2010-11-01,2010-11-03,2010-11-04


2 


2010-09-01,2010-09-26,2010-09-30,2010-10-05,2010-10-18,2010-10-25,2010-10- 
31,2010-11-02 


3 


2010-02-01,2010-02-02,2010-02-03,2010-04-19,2010-04-21,2010-04-22,2010-05- 
12,2010-05-19,2010-05-23,2010-05-24,2010-05-25,2010-05-30,2010-07-14,2010-07- 
15,2010-07-15,2010-07-22,2010-07-31,2010-08-31,2010-09-06,2010-10-12,2010-10- 
12,2010-10-29 


4 


2010-01-11,2010-01-12,2010-01-15,2010-01-24,2010-02-04,2010-03-15,2010-03- 

23,2010-04-12,2010-04-13,2010-04-14,2010-04-18,2010-04-25,2010-05-09,2010-05- 

11,2010-05-13,2010-05-18,2010-05-26,2010-06-01,2010-06-03,2010-06-16,2010-06- 

30,2010-07-01,2010-07-06,2010-07-26,2010-07-27,2010-08-30,2010-09-15,2010-09- 

16,2010-09-20,2010-09-28,2010-10-04,2010-10-17,2010-10-21 


5 


2010-01-06,2010-01-07,2010-01-08,2010-01-13,2010-01-22,2010-01-29,2010-02- 

22,2010-03-01,2010-03-02,2010-03-03,2010-03-04,2010-03-07,2010-03-08,2010-03- 

10,2010-03-11,2010-03-16,2010-03-17,2010-03-18,2010-03-22,2010-03-25,2010-03- 

30,2010-03-31,2010-04-01,2010-04-06,2010-04-07,2010-04-11,2010-04-15,2010-04- 

16,2010-04-28,2010-05-06,2010-05-07,2010-05-14,2010-05-16,2010-05-22,2010-06- 

04,2010-06-06,2010-06-07,2010-06-08,2010-06-10,2010-06-11,2010-06-17,2010-06- 

21,2010-06-22,2010-07-02,2010-07-20,2010-08-03,2010-08-04,2010-08-05,2010-08- 

17,2010-08-20,2010-08-24,2010-09-07,2010-09-08,2010-09-12,2010-09-21,2010-09- 

22,2010-10-08,2010-10-20 


6 


2010-01-28,2010-01-30,2010-02-18,2010-02-26,2010-02-28,2010-03-05,2010-03- 

09,2010-03-12,2010-03-19,2010-03-29,2010-04-04,2010-04-09,2010-04-29,2010-05- 

05,2010-05-08,2010-05-15,2010-05-17,2010-05-21,2010-05-27,2010-06-02,2010-06- 

18,2010-06-20,2010-06-23,2010-06-27,2010-06-28,2010-07-09,2010-07-11,2010-07- 

12,2010-07-29,2010-08-02,2010-08-06,2010-08-19,2010-08-23,2010-08-29,2010-09- 

05,2010-09-17,2010-09-23,2010-09-27,2010-09-29,2010-10-01,2010-10-02,2010-10-19 


7 


2010-01-04,2010-01-16,2010-01-23,2010-02-09,2010-02-15,2010-02-16,2010-02- 
19,2010-02-21,2010-02-24,2010-03-14,2010-03-28,2010-04-02,2010-04-03,2010-04- 
10,2010-04-24,2010-06-09,2010-06-24,2010-07-04,2010-07-16,2010-08-22,2010-09- 
03,2010-09-10,2010-09-14,2010-09-24,2010-10-22,2010-10-30 


8 


2009-12-24,2009-12-27,2009-12-28,2010-01-03,2010-01-05,2010-01-10,2010-02-
07,2010-02-08,2010-02-10,2010-02-17,2010-02-27,2010-03-26,2010-04-05,2010-04- 
08,2010-06-25,2010-07-08,2010-07-23,2010-07-25,2010-08-01,2010-08-11,2010-08- 
15,2010-08-16,2010-08-18,2010-08-21,2010-08-25,2010-10-07,2010-10-15,2010-10-16 


9 


2009-12-25,2009-12-26,2009-12-29,2010-01-09,2010-01-21,2010-02-05,2010-02- 

11,2010-02-14,2010-03-13,2010-04-20,2010-04-30,2010-07-03,2010-07-10,2010-08- 

08,2010-08-09,2010-08-10,2010-08-12,2010-08-26,2010-08-27 


10 


2009-12-30,2010-01-01,2010-01-02,2010-01-19,2010-02-12,2010-02-20,2010-03- 

06,2010-03-21,2010-05-29,2010-06-05,2010-06-12,2010-06-19,2010-07-30,2010-08- 

07,2010-08-13,2010-08-14,2010-09-04,2010-09-11,2010-10-11,2010-10-23 


11 


2009-12-31,2010-02-06,2010-02-13,2010-03-20,2010-03-27,2010-05-01,2010-05- 

02,2010-05-03,2010-05-04,2010-06-26,2010-07-17,2010-07-18,2010-07-24,2010-07- 

31,2010-08-28,2010-09-18,2010-09-19,2010-09-25,2010-10-09,2010-10-10 


12 


2010-01-18,2010-01-20,2010-01-25,2010-01-27,2010-04-23,2010-04-26,2010-04-27,
2010-05-20,2010-05-28,2010-05-31,2010-06-13,2010-06-14,2010-06-29,2010-07-05,
2010-07-13,2010-07-19,2010-07-21,2010-07-28,2010-09-02,2010-09-09,2010-09-13,
2010-10-03,2010-10-13,2010-10-14,2010-10-24

Figure 11: The relationship between the mean of rates per night and the number of opportunities,
and the relationship between it and the existence probability, for the period from 25th December
2009 to 4th November 2010 (left). Each point represents the relation on one observation day.



and found that the migration trends of travelers seem to depend on regions. 

It was found that these large-scale data on hotel opportunities enable us
to see several otherwise invisible properties of travelers' behavior in Japan.

As future work, we need higher-resolution data on consumers' bookings at
each hotel to capture demand-supply situations. If such data become available,
it will be possible to control room rates based on consumers' preferences.
Emerging technologies will make it possible to see, or foresee, things
which we cannot see at this moment.

Appendix A. Derivation of the EM-algorithm

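In this section, we derive the EM-algorithm from the maximum likelihood
estimation procedure. Let $G_m(z)$ represent a mixture of $K_m$ Poisson
distributions $F_{mi}(z)$ $(i = 1, \ldots, K_m)$,

$$G_m(z) = \sum_{i=1}^{K_m} a_{mi} F_{mi}(z), \tag{A.1}$$

$$F_{mi}(z) = \frac{(M r_{mi})^{z}}{z!} e^{-M r_{mi}}, \tag{A.2}$$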
where $a_{mi}$ denote the mixing ratios, which are normalized as

$$\sum_{i=1}^{K_m} a_{mi} = 1. \tag{A.3}$$



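From observations $\{z_m(t)\}$ $(t = 1, \ldots, T)$, the log-likelihood function in terms of the parameters
$a_{mi}$ and $r_{mi}$ $(i = 1, \ldots, K_m)$ can be written as

$$L_m(a_{m1}, \ldots, a_{mK_m}, r_{m1}, \ldots, r_{mK_m}) = \sum_{t=1}^{T} \log G_m(z_m(t)). \tag{A.4}$$

Inserting Eqs. (A.1) and (A.3) into Eq. (A.4), we have

$$L_m = \sum_{t=1}^{T} \log \left[ \sum_{i=1}^{K_m - 1} a_{mi} F_{mi}(z_m(t)) + \left( 1 - \sum_{i=1}^{K_m - 1} a_{mi} \right) F_{mK_m}(z_m(t)) \right]. \tag{A.5}$$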
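As a minimal sketch of how the log-likelihood of the Poisson mixture might be evaluated numerically, consider the following Python fragment. It is an illustration only, not the program used in this study. The function names and the sample counts are hypothetical, and the scaling of each Poisson mean as M times r is an assumption inferred from the initial value (sub-block mean divided by M) used later in this appendix.

```python
import math

def log_poisson(z, mean):
    """Log-probability of count z under a Poisson distribution with this mean."""
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def log_mixture(z, a, r, M):
    """log G(z) for mixing ratios a and parameters r.

    Assumption: component i is Poisson with mean M * r[i]; all a[i], r[i] > 0.
    """
    terms = [math.log(a_i) + log_poisson(z, M * r_i) for a_i, r_i in zip(a, r)]
    top = max(terms)  # log-sum-exp avoids underflow for large counts
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_likelihood(zs, a, r, M):
    """Sum of log G(z) over all observed counts zs."""
    return sum(log_mixture(z, a, r, M) for z in zs)

# Hypothetical daily counts with two regimes, M = 100.
counts = [2, 3, 2, 3, 50, 52, 49, 51]
print(log_likelihood(counts, [0.5, 0.5], [0.025, 0.505], 100))
```

Working in log space is a design choice of this sketch: it keeps the evaluation stable when the observed counts are large.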
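Partially differentiating Eq. (A.5) in terms of $a_{mi}$ and $r_{mi}$, we obtain

$$\frac{\partial L_m}{\partial a_{mi}} = \sum_{t=1}^{T} \frac{F_{mi}(z_m(t)) - F_{mK_m}(z_m(t))}{G_m(z_m(t))} \quad (i = 1, \ldots, K_m - 1), \tag{A.6}$$

$$\frac{\partial L_m}{\partial r_{mi}} = M \sum_{t=1}^{T} \frac{a_{mi} F_{mi}(z_m(t))}{G_m(z_m(t))} \left( \frac{z_m(t)}{M r_{mi}} - 1 \right) \quad (i = 1, \ldots, K_m). \tag{A.7}$$

At a maximum of $L_m$ both derivatives vanish,

$$\frac{\partial L_m}{\partial a_{mi}} = 0, \qquad \frac{\partial L_m}{\partial r_{mi}} = 0. \tag{A.8}$$

Multiplying $a_{mi}$ by Eq. (A.6) and summing them over $i$, we have

$$\sum_{t=1}^{T} \frac{F_{mK_m}(z_m(t))}{G_m(z_m(t))} = T, \tag{A.9}$$

and, because Eqs. (A.6) and (A.8) give the same value of this sum for every $F_{mi}$, multiplying $a_{mi}/T$ by Eq. (A.9) we obtain

$$a_{mi} = \frac{1}{T} \sum_{t=1}^{T} \frac{a_{mi} F_{mi}(z_m(t))}{G_m(z_m(t))}. \tag{A.10}$$

From Eqs. (A.7) and (A.8) we immediately obtain

$$r_{mi} = \frac{1}{M} \frac{\sum_{t=1}^{T} z_m(t) F_{mi}(z_m(t)) / G_m(z_m(t))}{\sum_{t=1}^{T} F_{mi}(z_m(t)) / G_m(z_m(t))}. \tag{A.11}$$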



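Therefore, if we find an adequate set of initial values for the parameters $\{a_{mi}^{(0)}\}$
and $\{r_{mi}^{(0)}\}$, then we can calculate the parameters by using the update rules for $\{a_{mi}^{(\nu)}\}$
and $\{r_{mi}^{(\nu)}\}$,

$$a_{mi}^{(\nu+1)} = \frac{1}{T} \sum_{t=1}^{T} \frac{a_{mi}^{(\nu)} F_{mi}^{(\nu)}(z_m(t))}{G_m^{(\nu)}(z_m(t))}, \tag{A.12}$$

$$r_{mi}^{(\nu+1)} = \frac{1}{M} \frac{\sum_{t=1}^{T} z_m(t) F_{mi}^{(\nu)}(z_m(t)) / G_m^{(\nu)}(z_m(t))}{\sum_{t=1}^{T} F_{mi}^{(\nu)}(z_m(t)) / G_m^{(\nu)}(z_m(t))}, \tag{A.13}$$

where

$$F_{mi}^{(\nu)}(z) = \frac{(M r_{mi}^{(\nu)})^{z}}{z!} e^{-M r_{mi}^{(\nu)}}, \tag{A.14}$$

$$G_m^{(\nu)}(z) = \sum_{i=1}^{K_m} a_{mi}^{(\nu)} F_{mi}^{(\nu)}(z). \tag{A.15}$$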
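One pass of the update rules for the mixing ratios and the Poisson parameters can be illustrated with the following Python sketch. It is not the program used in this study. The function names, the sample counts, and the scaling of each Poisson mean as M times r (inferred from the initial value, sub-block mean divided by M) are assumptions made for the example.

```python
import math

def log_poisson(z, mean):
    """Log-probability of count z under a Poisson distribution with this mean."""
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def em_step(zs, a, r, M):
    """Return updated (a, r) after one pass over the observed counts zs.

    Assumption: component i is Poisson with mean M * r[i]; all a[i], r[i] > 0.
    """
    K = len(a)
    weight = [0.0] * K    # sum over t of a_i F_i(z) / G(z)
    weight_z = [0.0] * K  # sum over t of z * a_i F_i(z) / G(z)
    for z in zs:
        logs = [math.log(a[i]) + log_poisson(z, M * r[i]) for i in range(K)]
        top = max(logs)
        w = [math.exp(v - top) for v in logs]  # proportional to a_i F_i(z)
        s = sum(w)                             # proportional to G(z)
        for i in range(K):
            weight[i] += w[i] / s
            weight_z[i] += z * w[i] / s
    new_a = [weight[i] / len(zs) for i in range(K)]
    # The factor a_i cancels between numerator and denominator of the r update.
    new_r = [weight_z[i] / (M * weight[i]) if weight[i] > 0 else r[i]
             for i in range(K)]
    return new_a, new_r

# Hypothetical counts with two regimes, M = 100.
counts = [2, 3, 2, 3, 50, 52, 49, 51]
a, r = [0.5, 0.5], [0.03, 0.4]
for _ in range(20):
    a, r = em_step(counts, a, r, 100)
print(a, r)
```

The updated mixing ratios sum to one by construction, so no separate normalization step is needed in this sketch.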

We compute these recursive equations starting from an arbitrary set of
parameters. Some starting points lead to convergence and others to divergence.
Therefore, it is important for us to find an adequate set of initial values
when we use the EM-algorithm given by Eqs. (A.12) and (A.13) for estimation.
To do so, we search for candidate parameters in a stochastic manner. This
procedure consists of three parts.

The choice of initial values is based on Finch et al.'s algorithm (Finch et al., 1989).
Their idea is that, given the mixing proportions $a_{mi}$, the order statistics of the
observations $z_m(t_s)$ $(s = 1, \ldots, T)$ are separated into $K_m$ parts. The $i$-th
sub-block contains $[a_{mi} T]$ observations, which are assumed to belong to the $i$-th
component of the mixture. We compute the mean $\mu_{mi}$ of the $i$-th sub-block and use it
as the initial value $r_{mi}^{(0)} = \mu_{mi}/M$.
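The block-mean initialization described above might be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: the function name is hypothetical, block boundaries are taken at the integer part of the cumulative mixing proportion times the number of observations, and an empty block borrows the nearest ordered observation.

```python
def finch_initial_rates(zs, a, M):
    """Initial parameters: mean of each sorted sub-block divided by M."""
    ordered = sorted(zs)
    T = len(ordered)
    rates = []
    start = 0
    cumulative = 0.0
    for i, a_i in enumerate(a):
        cumulative += a_i
        # The last block takes all remaining observations.
        end = T if i == len(a) - 1 else int(cumulative * T)
        block = ordered[start:end]
        if not block:
            # Assumption of this sketch: fall back to the nearest observation.
            block = [ordered[min(start, T - 1)]]
        rates.append(sum(block) / len(block) / M)
        start = end
    return rates

# Hypothetical counts, equal mixing proportions, M = 100.
print(finch_initial_rates([50, 2, 3, 52, 2, 3, 49, 51], [0.5, 0.5], 100))
```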

In the Monte Carlo step, we randomly allocate $a_{m1}, \ldots, a_{mK_m}$, where
$\sum_{i=1}^{K_m} a_{mi} = 1$, and evaluate the log-likelihood function at that point.
If the value of the log-likelihood function at this point is finite, we choose this set
of parameters as the new starting point for the recursive equations.

Taking this set of parameters as an initial condition, we iterate the
EM-algorithm until the log-likelihood value converges. If the converged value is
greater than the maximum value of the log-likelihood function already
obtained in the Monte Carlo step, then we keep the set of parameters as a
candidate for the parameter estimates.

Repeating this procedure until we cannot find any points which improve
the value of the log-likelihood function in the Monte Carlo step, we estimate an
adequate set of parameters. This algorithm is described as follows.

(0) Set maxobj = $-\infty$ and counter = 0.

(1) Generate normalized random numbers as $a'_{mi} = b'_{mi} / \sum_{j=1}^{K_m} b'_{mj}$
using i.i.d. uniform random numbers $b'_{mi}$.

(2) $r'_{mi}$ are generated with Finch et al.'s algorithm from $a'_{mi}$. If counter >
MAXCOUNT, then go to Step (6).

(3) If $L_m(a'_{m1}, \ldots, a'_{mK_m}, r'_{m1}, \ldots, r'_{mK_m})$ is greater than maxobj, then we set
maxobj as the value, $r_{mi} := r'_{mi}$ and $a_{mi} := a'_{mi}$, and go to Step (4).
Otherwise go to Step (1).

(4) From the starting point $(a_{m1}, \ldots, a_{mK_m}, r_{m1}, \ldots, r_{mK_m})$, compute Eqs. (10)
and (11) recursively until the value of the log-likelihood function converges.

(5) If the maximum value of the log-likelihood function in terms of the converged
set of parameters is larger than maxobj, then set the value as maxobj and
record the solution as a candidate of parameter estimates.

(6) Set counter = counter + 1 and, if counter < MAXCOUNT, go to Step (1).
Otherwise go to Step (7).

(7) Stop the program and display maxobj and the recorded candidate
as parameter estimates.
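The steps above can be sketched as a single Python routine. This is a simplified illustration, not the program used in this study. All names are hypothetical, each Poisson mean is assumed to be M times r, and, unlike the listing, every random trial counts toward the iteration limit so that the loop always terminates.

```python
import math
import random

def log_poisson(z, mean):
    """Log-probability of count z under a Poisson distribution with this mean."""
    if mean <= 0.0:
        return 0.0 if z == 0 else -math.inf
    return z * math.log(mean) - mean - math.lgamma(z + 1)

def log_terms(z, a, r, M):
    """log(a_i F_i(z)) per component; assumed Poisson mean is M * r_i."""
    return [(math.log(a_i) if a_i > 0 else -math.inf) + log_poisson(z, M * r_i)
            for a_i, r_i in zip(a, r)]

def log_likelihood(zs, a, r, M):
    total = 0.0
    for z in zs:
        terms = log_terms(z, a, r, M)
        top = max(terms)
        if top == -math.inf:
            return -math.inf
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def em_step(zs, a, r, M):
    """One pass of the update rules for the mixing ratios and parameters."""
    K = len(a)
    weight, weight_z = [0.0] * K, [0.0] * K
    for z in zs:
        terms = log_terms(z, a, r, M)
        top = max(terms)
        w = [math.exp(t - top) for t in terms]
        s = sum(w)
        for i in range(K):
            weight[i] += w[i] / s
            weight_z[i] += z * w[i] / s
    new_a = [x / len(zs) for x in weight]
    new_r = [weight_z[i] / (M * weight[i]) if weight[i] > 0 else r[i]
             for i in range(K)]
    return new_a, new_r

def finch_rates(zs, a, M):
    """Block means of the sorted counts divided by M (empty blocks borrow a value)."""
    ordered, T = sorted(zs), len(zs)
    rates, start, cumulative = [], 0, 0.0
    for i, a_i in enumerate(a):
        cumulative += a_i
        end = T if i == len(a) - 1 else int(cumulative * T)
        block = ordered[start:end] or [ordered[min(start, T - 1)]]
        rates.append(sum(block) / len(block) / M)
        start = end
    return rates

def search(zs, K, M, max_count=100, tol=1e-10, seed=0):
    rng = random.Random(seed)
    maxobj, best = -math.inf, None                  # Step (0)
    for _ in range(max_count):                      # Steps (6) and (7)
        b = [1.0 - rng.random() for _ in range(K)]  # Step (1), values in (0, 1]
        a = [x / sum(b) for x in b]
        r = finch_rates(zs, a, M)                   # Step (2)
        value = log_likelihood(zs, a, r, M)
        if value <= maxobj:                         # Step (3)
            continue
        maxobj = value
        while True:                                 # Step (4)
            a, r = em_step(zs, a, r, M)
            new_value = log_likelihood(zs, a, r, M)
            converged = new_value - value < tol
            value = new_value
            if converged:
                break
        if value >= maxobj:                         # Step (5)
            maxobj, best = value, (a, r)
    return maxobj, best

# Hypothetical counts with two regimes, two components, M = 100.
print(search([2, 3, 2, 3, 50, 52, 49, 51], K=2, M=100))
```

Fixing the seed makes a run reproducible; in practice one would vary it, and the iteration limit, to check that the recorded candidate is stable.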